1. INTRODUCTION

Pneumonia is a lung infection caused by microbes such as bacteria or viruses. The infection inflames
the lungs, which causes breathing difficulty and sometimes death [1]. In 2019, a new type of pneumonia
caused by the SARS-CoV-2 virus, called COVID-19, spread throughout the world and turned into a
pandemic [2]. In the early stages of the pandemic, the etiology was unknown, which made the disease
dangerous to the general public and to health workers alike. The spread of COVID-19 paralyzed the
medical systems of most developed countries because of its novelty, the lack of medical procedures to
tackle it, and the infection of health workers [3].

A computer-aided diagnosis (CADx) system uses the output of algorithms implemented on a computing
device to assist radiologists in diagnosing an illness. The CADx output serves as a second opinion that
complements the radiologist's judgment rather than replacing it [4]. A CADx system can reduce the
contact time between patients and the radiologist, which lowers the probability of health workers
becoming infected. Moreover, CADx can reduce the radiologist's uncertainty when making diagnostic
decisions and can support medical education.

Research on convolutional neural network (CNN) architectures has led to state-of-the-art models for
multi-class classification of generic images, with classification of the ImageNet database as the
benchmark [4]. However, training such deep neural network models requires an enormous amount of data
and computational power, whereas medical image datasets are small compared to ImageNet. This has led
to the use of these state-of-the-art models as pre-trained feature extractors [5]. Researchers have used
transfer learning models to classify X-ray images to identify COVID-19 infection [6]-[12] or pneumonia
in general [5], [13]-[15], cardiomegaly [16], osteoarthritis [17], breast cancer [18], skin cancer [19],
[20], tuberculosis [21], and disease-free chests [22]. Popular state-of-the-art models such as
NASNet [23], ResNet101/152 [24], InceptionResNetV2 [25], and Xception [26] have been used as transfer
learning models [12], [13], [18], [27] based on the hypothesis that ImageNet features generalize.
However, Kornblith et al. [28] find that ImageNet feature extractors do not generate
well-discriminating features for the classification of fine-grained datasets.

In this paper, we propose a CADx system for binary classification of chest X-ray images into two
classes: pneumonia and normal. The proposed model contains a residual connection, two paths of
convolutional layers, and multiple filters of different sizes, which allow the model to extract rich
discriminating features. The paper is organized as follows: Section 2 describes the dataset used to
train and evaluate the proposed technique as well as the proposed model. Section 3 presents the
performance evaluation, followed by the conclusion in Section 4.


2. MATERIAL AND METHOD 
2.1. Chest X-ray dataset 

The dataset used in this research contains 5856 validated chest X-ray images depicting pneumonia
and normal cases, which were collected and publicly shared by Kermany et al. [5]. The images are divided
into two groups: a training group and a validation group. Table 1 shows the distribution of the images
in this dataset and Figure 1 shows samples from it.


Table 1. Number of images for each class in the dataset

                    Subset
Class        Training   Validation
Normal       1349       234
Pneumonia    3883       390
Total        5232       624


Figure 1. Sample images from the chest X-ray dataset (the images in the first row represent normal
lungs; the images in the second row represent infected lungs)


Int J Artif Intell, Vol. 11, No. 4, December 2022: 1469-1477 


Int J Artif Intell ISSN: 2252-8938


The scarcity of medical images, or of images belonging to a specific class, may strongly affect the
training process by biasing the model's weights toward the class with the most data (the pneumonia
class in Table 1), which results in skewed classification. Hence, a cost-sensitive learning approach is
implemented in this research to deal with this problem. The cost-sensitive learning approach assigns a
high misclassification cost to the minority class and a lower misclassification cost to the majority
class, and hence adjusts the model's weights in a way that pays more attention to the minority class
than to the majority class [29]. This approach neither discards existing images from the dataset nor
generates synthetic data that does not represent the actual pictorial information. The weights are
calculated from the number of images for each class in the training set as in (1).


$$w_c = \frac{1}{k} \times \frac{N_t}{N_c} \tag{1}$$

where $w_c$ is the weight for class $c$, $N_c$ is the number of images in class $c$, $N_t$ is the total
number of images in the dataset, and $k$ is the number of classes.
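
As a worked illustration of (1), the class weights can be computed directly from the training counts in Table 1. The following is a minimal Python sketch; the function name `class_weights` and the dictionary layout are illustrative choices and are not taken from the original implementation.

```python
def class_weights(counts):
    """Return w_c = (1 / k) * (N_t / N_c) for every class c, as in (1)."""
    k = len(counts)                 # number of classes
    n_total = sum(counts.values())  # N_t: total number of training images
    return {label: n_total / (k * n_c) for label, n_c in counts.items()}


# Training-set counts from Table 1.
train_counts = {"Normal": 1349, "Pneumonia": 3883}
weights = class_weights(train_counts)

for label, w in weights.items():
    print(f"{label}: {w:.3f}")
```

With these counts, the minority class (Normal) receives a weight of about 1.94 and the majority class (Pneumonia) a weight of about 0.67, so a misclassified normal image costs roughly three times as much as a misclassified pneumonia image.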


2.2. The proposed CNN model

The proposed CNN model, shown in Figure 2, extracts three different types of features, which are
later concatenated into one rich feature vector with a satisfactory representation of the classes. These
features are extracted using three paths named A, B, and C. In path A, fine features are extracted, as
there is no max-pooling layer to perform down-sampling. In path B, several max-pooling operations are
applied in sequence to the input of the path, so that sharp features are extracted. Path C extracts
basic features, which are the output of a single convolutional layer. The model has four convolutional
layers in path A, three in path B, and one at the beginning. Path C also acts as a skip connection,
which has been added to the proposed model to reduce the impact of vanishing gradients. There are seven
types of layers in the proposed model architecture, and they are explained in the following:

a. The scaling layer scales the pixel values of the input image to the range [-1, 1]. After
rescaling the input, dropout with a rate of 0.3 is applied to expose the model to all the features in
the image and prevent it from considering noise as a feature.

b. The convolutional layer applies a spatial convolution to its input with several filters, which
generates multiple feature maps. Filters of different sizes are used to capture most of the features in
the original image regardless of their size. A non-linear activation function is applied to the output
of the convolution so that the model can capture non-linear features. In this architecture, different
activation functions are used, such as Tanh, ELU, Swish, ReLU, and SELU [30], [31].

c. The batch normalization layer normalizes the output of the convolutional layers so that the
feature maps have zero mean and unit variance. Batch normalization can stabilize the learning process
and speed up convergence [32]. It also reduces the effect of weight initialization on the model.

d. The max-pooling layer downsamples the feature maps and keeps only the sharp features. Three
max-pooling operations with a size of 2x2 and a stride of 2x2 are used in path B.

e. The global average pooling layer averages each feature map over its spatial dimensions, producing
one value per feature map.

f. The concatenation layer combines the final feature maps from the global average pooling in paths
A, B, and C. The concatenated features form a single large feature vector that discriminates the
pneumonia class from the normal class.

g. The fully connected layer classifies the concatenated features into the pneumonia and normal
classes depending on the threshold of the activation function. Moreover, dropout with a rate of 0.3 is
applied to the fully connected layer. The dropout introduces sparsity into the activations of the hidden
neurons, i.e., a sparse representation of the data, and prevents the neural network from overfitting to
the training data. There is no image augmentation layer in the model, as our tests in this research
showed that augmentation negatively affected the performance of the models. Moreover, combining image
augmentation techniques has previously been reported to increase overfitting on very limited
datasets [33].
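
The scaling, max-pooling, global average pooling, and concatenation steps listed above can be illustrated with a few lines of plain Python. This is only a sketch of the arithmetic, not the model itself: the helper names are hypothetical, the input is assumed to be 8-bit pixel data in [0, 255], the convolutions are assumed to preserve the spatial size, and the toy feature values are made up.

```python
def scale_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to the range [-1, 1]."""
    return value / 127.5 - 1.0


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def global_average_pool(feature_map):
    """Collapse one 2-D feature map into a single average value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Spatial size in path B: a 160 x 160 input after three 2x2, stride-2 pools.
size = 160
for _ in range(3):
    size //= 2
print(size)  # 20

# Toy 4x4 feature map: pooling followed by global averaging.
toy = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(toy)             # [[4, 2], [2, 8]]
feature = global_average_pool(pooled)  # 4.0

# Concatenation: pooled values from the three paths joined into one vector.
path_a, path_b, path_c = [0.2, 0.7], [feature], [0.1]  # made-up values
combined = path_a + path_b + path_c
```

Under these assumptions, each max-pooling step halves the width and height, so path B works on 20 x 20 maps after its third pooling operation, while path A keeps the full resolution.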


3. RESULTS AND DISCUSSION 

The metrics used in the experimental analysis are divided into metrics that evaluate the performance
of the model, such as recall, precision, F1-score, Kappa statistics, and heat maps, and metrics that
evaluate the complexity of the model, such as the total number of parameters, model size, and testing
time. The training and testing of the models were conducted on Windows 10 using an NVIDIA GeForce GTX
1660 Ti GPU, an Intel Core i7-10750 2.60 GHz CPU, and 16 GB of installed RAM. Due to the random weight
initialization in artificial neural networks, the final result usually differs each time the model is
trained. Hence, we trained and tested each of the transfer learning CNN models as well as the proposed
model five times and calculated the average of the results. The number of training epochs is 6, the
batch size is 8, and the images are down-sampled to 160 x 160 before being fed into the models.
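
The averaging over repeated runs can be sketched as follows. The five recall values are hypothetical and serve only to show the calculation, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, standard deviation) of one metric over repeated runs."""
    # pstdev is the population standard deviation (an assumption here).
    return statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical recall values from five independent training runs.
recall_runs = [0.90, 0.92, 0.91, 0.93, 0.89]
mean, std = summarize_runs(recall_runs)
print(f"{mean:.3f} ± {std:.3f}")
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample standard deviation instead, which is slightly larger for five runs.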

When designing the model, we explored a wide range of hyper-parameter combinations against the
evaluation accuracy by using Bayesian optimization with a Gaussian process. Bayesian optimization was
chosen for its performance compared to other algorithms such as grid search and random search [34]. The
tuned hyper-parameters include the activation functions, the number and size of the filters in each
convolutional layer, the learning rate, and the choice of optimization function. The final activation
functions, numbers of filters, and filter sizes after tuning are shown in Figure 2. The learning rate
was tuned to 0.001, and adaptive moment estimation (Adam) was chosen as the optimization function. The
transfer learning models are used as feature extractors, while the classifier consists of a global
average pooling layer, batch normalization, and a fully connected layer with dropout regularization.

Figure 2. The proposed model


3.1. Recall, precision, and F-beta score 

Ideally, when input data is fed into a trained classification model, the output is the actual class
of the input. In this scenario, a correct prediction for an X-ray image of the pneumonia class is called
a true positive (TP), while a correct prediction for a normal case is called a true negative (TN).
However, in real-case scenarios this outcome isn't guaranteed, for reasons such as noisy data or weak
handling of the features by the model, which can produce false positives (FP) and false negatives (FN).
An FN, which is a pneumonia case classified as normal, has a severe impact on the patient's health
because it prevents or delays treatment. On the other hand, an FP, which is a normal case classified as
pneumonia, raises the medical cost and workload. The relation between TP and FP is called the precision
and can be calculated using (2); high precision means low FP.


$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$


while the relation between TP and FN is called the recall, which can be calculated using (3); high
recall means low FN.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
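
As a small illustration of (2) and (3), precision and recall can be computed from confusion counts as in the sketch below. The counts are hypothetical and do not come from the experiments; the functions assume at least one positive prediction and one positive case, so the denominators are non-zero.

```python
def precision(tp, fp):
    """Share of predicted pneumonia cases that are truly pneumonia, as in (2)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of true pneumonia cases that the model detects, as in (3)."""
    return tp / (tp + fn)


# Hypothetical confusion counts for the pneumonia class.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)  # 90 / 100
r = recall(tp, fn)     # 90 / 120
print(f"precision={p:.2f}, recall={r:.2f}")
```

In this made-up case the model is precise (few false alarms) but misses a quarter of the pneumonia cases, which is the kind of error that matters most in diagnosis.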


Precision and recall can be used to measure the performance of the model when the cost of FP or
FN, respectively, is critical. In the diagnosis of an illness, the higher the recall the better, as the
consequences of a missed case are more damaging to the patient. In practice, it is hard to obtain a
model that has both high precision and high recall at the same time. The F-beta score combines
precision and recall in a weighted harmonic mean as shown in (4).


$$F_\beta = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{(\beta^2 \times \text{Precision}) + \text{Recall}} \tag{4}$$


The weight is chosen as β = 1 when precision and recall are equally important. To favor precision
over recall, β < 1 is chosen; to favor recall, β > 1. As mentioned above, recall should be more
important than precision for the pneumonia class. Table 2 shows the performance of the proposed model
along with the transfer learning models. As mentioned earlier, each test is conducted five times, and
the average is reported as the final score along with the standard deviation (std).
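
The F-beta score in (4) can be checked numerically with a short sketch. The function name `f_beta` is an illustrative choice; the inputs are the mean precision (0.942) and recall (0.912) reported for the pneumonia class of the proposed model in Table 2.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall, as in (4)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Mean precision and recall of the proposed model, pneumonia class (Table 2).
p, r = 0.942, 0.912
scores = {beta: round(f_beta(p, r, beta), 2) for beta in (0.5, 1, 2)}
print(scores)  # {0.5: 0.94, 1: 0.93, 2: 0.92}
```

The three values agree with the F-beta columns reported for the proposed model, and they show how the score moves toward the recall as β grows.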

A successful model for classifying pneumonia images should have a high recall and relatively high
precision. From Table 2, we can see that ResNet101V2 showed the highest recall score among the CNN
models but a low precision score for the pneumonia class; hence, it is deemed an impractical model. The
proposed model, in contrast, combines high scores for both recall and precision for the pneumonia
class, which translates to a better F-beta score for all values of β compared to the other models. The
results highlight the importance of training the feature extractor on a fine-grained dataset such as
the chest X-ray images. The proposed model has learned discerning features that discriminate normal
from infected lung images and that do not exist in a generic dataset like ImageNet.


Table 2. Precision, recall, and F-beta score results

                                                                    F-beta
Model              Class      Precision    Recall       β=0.5  β=1   β=2
InceptionResNetV2  Normal     0.712±0.025  0.864±0.009  0.74   0.78  0.83
                   Pneumonia  0.904±0.004  0.79±0.004   0.88   0.84  0.81
NASNetLarge        Normal     0.844±0.042  0.784±0.095  0.83   0.81  0.80
                   Pneumonia  0.878±0.045  0.908±0.041  0.88   0.89  0.90
ResNet101V2        Normal     0.874±0.021  0.702±0.058  0.83   0.78  0.73
                   Pneumonia  0.844±0.024  0.938±0.012  0.86   0.89  0.92
ResNet152V2        Normal     0.868±0.039  0.804±0.077  0.85   0.83  0.82
                   Pneumonia  0.89±0.038   0.924±0.036  0.90   0.91  0.92
Xception           Normal     0.838±0.031  0.82±0.036   0.83   0.83  0.82
                   Pneumonia  0.894±0.018  0.902±0.025  0.90   0.90  0.90
Proposed model     Normal     0.866±0.050  0.902±0.056  0.87   0.88  0.89
                   Pneumonia  0.942±0.029  0.912±0.039  0.94   0.93  0.92
Radiologist        Normal     0.96         0.91         0.95   0.93  0.92
                   Pneumonia  0.95         0.98         0.96   0.96  0.97


3.2. Kappa statistics 

Cohen's kappa statistic (interrater reliability) measures the agreement between different raters
given the same data [35]. Despite using the same data, the CNN models in this study use different
feature extraction methods, which result in different classifications. Hence, the kappa statistic is
used to determine the interrater reliability between the radiologist and the CNN models. The kappa
statistic can be calculated as in (5) and ranges from -1 to +1. The Kappa score is interpreted as
follows: a score of 0 or less indicates no agreement between the raters; 0.01 to 0.20, slight
agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial
agreement; and 0.81 to 1.00, almost perfect agreement. Table 3 shows Cohen's kappa statistic for the
CNN models.


$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \tag{5}$$


where Pr(a) is the observed probability of agreement between the raters and Pr(e) is the agreement
expected by chance.

Table 3. Kappa statistic

Model              Kappa statistic
InceptionResNetV2  0.583±0.030
NASNetLarge        0.648±0.042
ResNet101V2        0.614±0.029
ResNet152V2        0.688±0.050
Xception           0.676±0.033
Proposed model     0.740±0.008


From Table 3, we can see that most of the models show substantial agreement between the
radiologist's assessment and the models' predictions. However, the proposed model has the highest kappa
score with the smallest standard deviation. The kappa statistic result may reinforce the idea that the
proposed model has learned medically relevant diagnostic features.
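
For two raters and two classes, the kappa statistic and the interpretation bands described in this section can be sketched as below. The function names and the agreement counts are hypothetical; in this setting rater A would be the radiologist and rater B a CNN model.

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters labelling the same cases as positive/negative."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    pr_a = (both_pos + both_neg) / n                  # observed agreement
    a_pos = (both_pos + a_pos_b_neg) / n              # rater A says positive
    b_pos = (both_pos + a_neg_b_pos) / n              # rater B says positive
    pr_e = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # agreement by chance
    return (pr_a - pr_e) / (1 - pr_e)


def interpret(kappa):
    """Map a kappa score to the agreement bands listed in the text."""
    if kappa <= 0:
        return "no agreement"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label + " agreement"
    raise ValueError("kappa cannot exceed 1")


# Hypothetical counts: 50 cases, the raters agree on 35 of them.
kappa = cohens_kappa(both_pos=20, a_pos_b_neg=5, a_neg_b_pos=10, both_neg=15)
print(round(kappa, 2), interpret(kappa))
print(interpret(0.740))  # mean kappa of the proposed model in Table 3
```

In the hypothetical example the observed agreement is 0.7 and the chance agreement is 0.5, which gives a kappa of 0.4, i.e. fair agreement. The mean score of 0.740 reported for the proposed model falls in the substantial band.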


3.3. Class activation map (CAM) 

The class activation map (CAM) is a helpful tool for finding which features in an image have the
highest impact on the prediction of the image class. Hence, CAM can be used to validate that the model
is picking up the right underlying pattern for each class. If the model does not pick up the right
patterns, the training of the model should be revised. However, CAM has a drawback: it requires
changing the CNN model and dropping the fully connected layers. Selvaraju et al. [36] proposed a
generalization of CAM called the gradient-weighted class activation map (Grad-CAM). Grad-CAM does not
require architecture modification and is applicable to a wide range of CNN model families. Grad-CAM
uses the gradient information flowing into the last convolutional layer of the CNN to visualize the
importance of each part of the image for the classification of the class at hand.
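
The core Grad-CAM computation can be sketched in plain Python once the feature maps of the last convolutional layer and the gradients of the class score with respect to them are available. In the sketch below both are supplied as small made-up arrays; in a real model they would come from the deep learning framework's automatic differentiation. The function name `grad_cam` is an illustrative choice.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from last-conv-layer activations and gradients.

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]:   gradient of the class score w.r.t. that value.
    """
    rows, cols = len(activations[0]), len(activations[0][0])
    # Channel weight: global average of the gradients of feature map k.
    alphas = [
        sum(sum(row) for row in grad_k) / (rows * cols)
        for grad_k in gradients
    ]
    # Weighted sum of the feature maps, followed by ReLU.
    return [
        [
            max(0.0, sum(a * act_k[i][j] for a, act_k in zip(alphas, activations)))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy example: two 2x2 feature maps with made-up gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]],      # channel weight 0.5
         [[-1.0, -1.0], [-1.0, -1.0]]]  # channel weight -1.0
heat_map = grad_cam(acts, grads)
print(heat_map)  # [[0.5, 0.0], [0.0, 1.0]]
```

The ReLU keeps only the regions that push the class score up. For display, the resulting coarse map is typically resized to the input resolution and overlaid on the X-ray image.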

Figures 3 and 4 show the X-ray images and the corresponding Grad-CAMs. Note that in normal cases
the lungs appear dark on X-ray images, whereas the spine appears white; this is because the air in the
lungs attenuates X-rays less than the bones of the spine. Generally, radiologists diagnose pneumonia
when there is a loss-of-silhouette sign [37], which is the loss of the heart borders with the adjacent
lung segments. It is observed that both the normal and the pneumonia Grad-CAMs highlight the medial
part of the X-ray images, including the lungs and part of the spine. However, the pneumonia Grad-CAMs
show a significant loss-of-silhouette sign in the X-ray images.


Figure 3. Normal images. First row: chest X-ray images; second row: corresponding heat maps


3.4. Model complexity 

A model with low complexity is more suitable for deployment in real-time CADx as well as on devices
with limited computational capability and storage. The main factor that affects model complexity is the
number of model parameters. Model parameters are the variables that are tuned throughout the training
of the artificial neural network to improve the predictions of the model. The higher the number of
parameters, the more computations are required in both training and testing, so the time required for
training and testing follows from the number of parameters that the model has. The proposed model is
very small compared to the transfer learning models, as can be seen in Table 4. This size also affects
the time needed to test chest X-ray images, as the number of multiply-add operations decreases. The
test time in Table 4 is measured for a batch of size 32.
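
To put the figures of Table 4 into perspective, the short sketch below derives two quantities from them: the parameter count of each model relative to the proposed model, and the mean test time per image for a batch of 32. The helper names are hypothetical, and the per-image time is a simple division of the batch time, which ignores any fixed per-batch overhead.

```python
# Parameter counts and mean batch test times (seconds) copied from Table 4.
MODELS = {
    "InceptionResNetV2": (54_344_417, 2.99),
    "NASNetLarge": (84_936_979, 4.44),
    "ResNet101V2": (42_636_801, 1.79),
    "ResNet152V2": (58_341_889, 2.41),
    "Xception": (20_871_721, 1.19),
    "Proposed model": (2_337_057, 0.87),
}
BATCH_SIZE = 32


def param_ratio(name, baseline="Proposed model"):
    """Number of parameters of a model relative to the baseline model."""
    return MODELS[name][0] / MODELS[baseline][0]


def per_image_seconds(name):
    """Mean test time per image, assuming time scales evenly over the batch."""
    return MODELS[name][1] / BATCH_SIZE


for name in MODELS:
    print(f"{name:18s} {param_ratio(name):5.1f}x "
          f"{per_image_seconds(name) * 1000:6.1f} ms/image")
```

On these numbers even Xception, the smallest of the transfer learning models, has roughly 8.9 times as many parameters as the proposed model, and NASNetLarge has more than 36 times as many.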


Figure 4. Pneumonia images. First row: chest X-ray images; second row: corresponding heat maps


Table 4. Models' complexity comparison

Model              Total # of parameters  Size (in KB)  Time (in sec)
InceptionResNetV2  54,344,417             214,377       2.99±0.05
NASNetLarge        84,936,979             334,750       4.44±0.11
ResNet101V2        42,636,801             167,619       1.79±0.08
ResNet152V2        58,341,889             229,467       2.41±0.06
Xception           20,871,721             81,992        1.19±0.06
Proposed model     2,337,057              27,548        0.87±0.03


4. CONCLUSION 

In this paper, we proposed a new CNN model, which was tested on binary pneumonia classification,
and the results were verified using recall, precision, F-beta score, and Kappa statistics. The model is
more suitable for real-time CADx as it has lower complexity than the transfer learning models. Unlike
the transfer learning models, which rely on generic features, the proposed model was trained to extract
fine-grained features from the dataset. The extracted features gave the proposed model a lead in the
test scores, as they effectively represent the pneumonia and normal classes. The extracted features
were verified by inspecting the Grad-CAMs extracted from the later convolutional layers of the model.